In response to a severe lack of reporting within government sources, The Washington Post compiled a database of every fatal police shooting in the United States from 2015-2022. We are interested in exploring this data, then carrying out research about a specific variable in the data set: the use of body cameras during fatal shootings.
This exploratory data analysis is divided into five main parts: first, we organize the data; second, we reshape the data for state- and region-based comparative analyses and build four new variables (police spending, laws that mandate body camera usage, political leaning (Republican or Democrat) in 2020, and number of police officers); third, we ask a SMART research question about our data and attempt to answer it; fourth, we continue our research by asking a modeling SMART question and attempting to answer it.
To skip to the modeling part of this project, please scroll to line 1053, where Part 5 starts.
First we call our packages. Then we read the data set that comes from a csv file called FPS22.csv.
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ tibble 3.1.8 ✔ purrr 0.3.4
## ✔ tidyr 1.2.1 ✔ stringr 1.4.1
## ✔ readr 2.1.3 ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ plotly::filter() masks dplyr::filter(), stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
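The loading and cleaning step can be sketched roughly as follows. This is an illustration under stated assumptions, not the code behind this report: the small file written here is a stand-in for FPS22.csv, its column names are assumed, and `na.omit()` is only one plausible way to account for null values.

```r
# Minimal sketch: read a CSV file and drop incomplete observations.
# The temporary file is a made-up stand-in for FPS22.csv (columns assumed).
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "name,age,state,body_camera",
  "Person A,53,WA,FALSE",
  "Person B,,OR,TRUE",
  "Person C,41,CA,FALSE"
), csv_path)

# Treat empty fields and the literal string "NA" as missing values
fps_raw <- read.csv(csv_path, na.strings = c("", "NA"), stringsAsFactors = FALSE)

# Keep only rows with no missing values in any column
fps <- na.omit(fps_raw)

nrow(fps_raw)  # number of rows before cleaning
nrow(fps)      # number of rows after cleaning
```

Comparing the row count before and after cleaning is a quick way to confirm how many observations were dropped.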
After accounting for null values, the data set we are working with has 5,720 observations. Below we have provided a single sample observation:
| Name | Date | Manner of Death | Armed | Age | Gender | Race | City |
|---|---|---|---|---|---|---|---|
| Tim Elliot | 10/04/2022 | Shot | Gun | 53 | M | A | Shelton |
| State | Signs of Mental Illness | Threat Level | Flee | Body Camera | Longitude | Latitude | Is Geocoding Exact? |
|---|---|---|---|---|---|---|---|
| WA | 1 | TRUE | Not fleeing | FALSE | -123 | 47.2 | TRUE |
The total number of observations:
## [1] 5720
After carrying out an exploratory data analysis, we decided to run comparative analyses between states and regions in order to create a specific, measurable, achievable, relevant, and time-oriented (SMART) research question to pursue for the remainder of the project.
To do this, we began by dividing the data into regions for easier visualization and comparative analysis. The regions divide each US state as follows:
| Northwest (NW) | Southwest (SW) | Midwest (MW) | Southeast (SE) | Northeast (NE) |
|---|---|---|---|---|
| California | New Mexico | Illinois | Georgia | New York |
| Washington | Arizona | Wisconsin | Alabama | Rhode Island |
| Oregon | Texas | Indiana | Mississippi | Maryland |
| Nevada | Oklahoma | Michigan | Louisiana | Vermont |
| Idaho | Hawaii | Minnesota | Tennessee | Pennsylvania |
| Utah | - | Missouri | North Carolina | Maine |
| Montana | - | Iowa | South Carolina | New Hampshire |
| Colorado | - | Kansas | Florida | New Jersey |
| Wyoming | - | North Dakota | Arkansas | Connecticut |
| Arkansas | - | South Dakota | West Virginia | Massachusetts |
| Arkansas | - | Nebraska | DC | - |
| - | - | Ohio | Virginia | - |
Fatal shootings in the Northwest United States:
## [1] 1551
Fatal shootings in the Southwest United States:
## [1] 1058
Fatal shootings in the Midwest United States:
## [1] 955
Fatal shootings in the Southeast United States:
## [1] 1668
Fatal shootings in the Northeast United States:
## [1] 488
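One way to produce regional counts like those above is a named lookup vector from state abbreviation to region. The sketch below is illustrative only: it lists a handful of states, the `shootings` data frame is made up, and the object names are assumptions rather than the names used in this project.

```r
# Sketch: assign each incident to a region via a named lookup vector.
# Only a few states are included; the incident rows are made up.
region_lookup <- c(
  WA = "NW", OR = "NW", CA = "NW",
  TX = "SW", AZ = "SW",
  OH = "MW", IL = "MW",
  GA = "SE", FL = "SE",
  NY = "NE", PA = "NE"
)

shootings <- data.frame(
  state = c("WA", "TX", "CA", "GA", "NY", "OH", "FL", "CA"),
  stringsAsFactors = FALSE
)

# Index the lookup vector by state abbreviation to get each row's region
shootings$regions <- factor(
  unname(region_lookup[shootings$state]),
  levels = c("MW", "NE", "NW", "SE", "SW")
)

# Count incidents per region
region_counts <- table(shootings$regions)
region_counts
```

Any state missing from the lookup vector would come back as `NA`, so checking `anyNA(shootings$regions)` is a cheap guard against states that were left out of the mapping.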
To explore what might explain whether police body cameras were turned on or off, we added several new variables. These will help us understand why states differ in whether body cameras are used during fatal police shootings. We provide some sample data for each new variable.
The first new data set we add is state spending per capita on police for the year 2021.
Then we add a new binary variable that illustrates whether states have laws that mandate a police officer to use their body camera when interacting with members of the public. Not many states have this particular law, so only Maryland, New Jersey, New Mexico, and South Carolina are given the value 1, while all other states are given the value 0.
We add data on the political affiliation of each state, measured by the state's margin in the 2020 presidential election, that is, whether the state voted for President Trump or for President Biden. Negative values indicate a margin in favor of Trump and positive values indicate a margin in favor of Biden.
Finally, we add a variable that looks at a state’s quantity of police officers per 100K citizens.
Example data points are shown below:
| State | Variable | Value |
|---|---|---|
| Alabama | Police Spending Per Capita | USD$477 |
| Maryland | Police Body Camera Laws | 1 |
| Iowa | 2020 US Presidential Election Vote | -8 |
| Florida | Police Quantity Per 100k Citizens | 477 |
We also created two sub-data sets by grouping the data by state and by region for visualization purposes. The contents of both groups are identical, besides their grouping.
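Attaching state-level variables to incident-level rows amounts to a left join on the state column. The following is a minimal sketch using base R's `merge()`. The column names `spendpc`, `bclaw`, and `marg2020` mirror the summaries in this report, but the data frames and most values are made up for illustration.

```r
# Sketch: left-join state-level variables onto incident-level rows.
incidents <- data.frame(
  state = c("AL", "MD", "IA", "AL"),
  body_camera = c(FALSE, TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# One row per state. AL spending (477), MD law (1), and IA margin (-8)
# follow the sample table; the remaining values are illustrative.
state_vars <- data.frame(
  state = c("AL", "MD", "IA"),
  spendpc = c(477, 700, 500),
  bclaw = c(0, 1, 0),
  marg2020 = c(-25, 33, -8),
  stringsAsFactors = FALSE
)

# all.x = TRUE keeps every incident even if its state has no match
merged <- merge(incidents, state_vars, by = "state", all.x = TRUE)
merged
```

Because the join key repeats on the incident side, every incident in a given state receives the same state-level values, which is why those variables take only a limited number of distinct values in the row-level summaries.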
Within our data set of 5,720 observations of police shootings from 2015 to 2022 in the United States, is there a correlation between the U.S. state of observation and whether a body camera was turned on during the shooting?
The state data subgroup can be summarized as follows:
## state month year regions
## Length:1763 Length:1763 Length:1763 MW:285
## Class :character Class :character Class :character NE:155
## Mode :character Mode :character Mode :character NW:503
## SE:478
## SW:342
##
## spendpc bclaw marg2020 le_per_100k stbcp
## Min. : 390 Min. :0.000 Min. :-43.0 Min. :284 Min. :0.000
## 1st Qu.: 547 1st Qu.:0.000 1st Qu.: -8.0 1st Qu.:378 1st Qu.:0.051
## Median : 633 Median :0.000 Median : 0.3 Median :438 Median :0.125
## Mean : 665 Mean :0.019 Mean : 2.7 Mean :442 Mean :0.113
## 3rd Qu.: 756 3rd Qu.:0.000 3rd Qu.: 19.0 3rd Qu.:479 3rd Qu.:0.146
## Max. :1337 Max. :1.000 Max. : 87.0 Max. :722 Max. :1.000
## gen.p smi.p flee.p att.p armed.p
## Min. :0.852 Min. :0.000 Min. :0 Min. :0.333 Min. :0.500
## 1st Qu.:0.941 1st Qu.:0.224 1st Qu.:0 1st Qu.:0.583 1st Qu.:0.877
## Median :0.960 Median :0.255 Median :0 Median :0.631 Median :0.917
## Mean :0.955 Mean :0.264 Mean :0 Mean :0.662 Mean :0.915
## 3rd Qu.:0.973 3rd Qu.:0.286 3rd Qu.:0 3rd Qu.:0.750 3rd Qu.:0.949
## Max. :1.000 Max. :1.000 Max. :0 Max. :1.000 Max. :1.000
## MoD.p age.avg Non_White_prop
## Min. :0.667 Min. :30.1 Min. :0.000
## 1st Qu.:0.914 1st Qu.:34.4 1st Qu.:0.364
## Median :0.928 Median :35.3 Median :0.529
## Mean :0.936 Mean :36.2 Mean :0.479
## 3rd Qu.:0.970 3rd Qu.:37.7 3rd Qu.:0.583
## Max. :1.000 Max. :53.7 Max. :1.000
The region data subgroup can be summarized as follows:
## state month year spendpc
## Length:1763 Length:1763 Length:1763 Min. : 390
## Class :character Class :character Class :character 1st Qu.: 547
## Mode :character Mode :character Mode :character Median : 633
## Mean : 665
## 3rd Qu.: 756
## Max. :1337
## bclaw marg2020 le_per_100k stbcp gen.p
## Min. :0.000 Min. :-43.0 Min. :284 Min. :0.000 Min. :0.852
## 1st Qu.:0.000 1st Qu.: -8.0 1st Qu.:378 1st Qu.:0.051 1st Qu.:0.941
## Median :0.000 Median : 0.3 Median :438 Median :0.125 Median :0.960
## Mean :0.019 Mean : 2.7 Mean :442 Mean :0.113 Mean :0.955
## 3rd Qu.:0.000 3rd Qu.: 19.0 3rd Qu.:479 3rd Qu.:0.146 3rd Qu.:0.973
## Max. :1.000 Max. : 87.0 Max. :722 Max. :1.000 Max. :1.000
## smi.p flee.p att.p armed.p MoD.p
## Min. :0.000 Min. :0 Min. :0.333 Min. :0.500 Min. :0.667
## 1st Qu.:0.224 1st Qu.:0 1st Qu.:0.583 1st Qu.:0.877 1st Qu.:0.914
## Median :0.255 Median :0 Median :0.631 Median :0.917 Median :0.928
## Mean :0.264 Mean :0 Mean :0.662 Mean :0.915 Mean :0.936
## 3rd Qu.:0.286 3rd Qu.:0 3rd Qu.:0.750 3rd Qu.:0.949 3rd Qu.:0.970
## Max. :1.000 Max. :0 Max. :1.000 Max. :1.000 Max. :1.000
## age.avg Non_White_prop
## Min. :30.1 Min. :0.000
## 1st Qu.:34.4 1st Qu.:0.364
## Median :35.3 Median :0.529
## Mean :36.2 Mean :0.479
## 3rd Qu.:37.7 3rd Qu.:0.583
## Max. :53.7 Max. :1.000
We will now check our data for normality:
Because the points in the plot fall along a roughly straight line, we treat this data as close enough to normal for our purposes.
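A visual normality check of this kind is commonly done with a normal Q-Q plot, optionally backed by a formal test. The sketch below assumes that approach and uses simulated values as a stand-in for the real variable. It is not the code that produced the plot in this report.

```r
# Sketch: visual and numeric normality check on a numeric variable.
set.seed(1)
x <- rnorm(200, mean = 0.11, sd = 0.05)  # simulated stand-in data

# Draw to a null PDF device so the example also runs non-interactively
pdf(NULL)
qq <- qqnorm(x, main = "Normal Q-Q plot (simulated data)")
qqline(x)  # reference line through the first and third quartiles
invisible(dev.off())

# Shapiro-Wilk test as a numeric complement to the plot
sw <- shapiro.test(x)
sw$p.value
```

Points that track the reference line closely suggest approximate normality, while systematic curvature in the tails suggests skew or heavy tails.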
Now let us look at the body camera proportions by state. In the bar graph below, TRUE signifies that the police body camera was on, while FALSE indicates that it was off:
Number of fatal shootings where the body camera was on:
## body_camera n
## 1 TRUE 905
Number of fatal shootings where the body camera was off:
## body_camera n
## 1 FALSE 5383
This scatter plot shows the proportion of fatal shootings in which cameras were on, by state (the variable stbcp). Each point depicts a state's proportion of shootings where the police body camera was turned on during the incident. There is very little variation among states in the Southwest and considerable variation among states in the Midwest.
Finally, let us check the mean proportion of body cameras turned on (stbcp) across all states:
## [1] 0.113
And the median proportion of body cameras turned on (stbcp) across all states:
## [1] 0.125
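A per-state proportion such as stbcp can be computed as the mean of a logical column within each state. The sketch below uses made-up incidents and summarizes over one row per state. The figures above may instead have been computed over incident-level rows, so treat this only as an illustration of the idea.

```r
# Sketch: proportion of incidents with the body camera on, per state.
incidents <- data.frame(
  state = c("WA", "WA", "WA", "WA", "OR", "OR", "KS", "KS"),
  body_camera = c(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# The mean of a logical vector is the share of TRUE values
stbcp_by_state <- aggregate(body_camera ~ state, data = incidents, FUN = mean)
names(stbcp_by_state)[2] <- "stbcp"
stbcp_by_state

# Summaries across states
mean(stbcp_by_state$stbcp)
median(stbcp_by_state$stbcp)
```

States with very few incidents can land at exactly 0 or 1, which is consistent with the minimum and maximum of stbcp reported in the summaries.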
We will now perform a chi-square test to see if there is a significant difference between the proportions of each state.
Null: There are no significant differences between US states in the proportion of body cameras turned on during police shootings.
Alternative: There are significant differences between US states in the proportion of body cameras turned on during police shootings.
Significance level: α = 0.05
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 0.051 0.125 0.113 0.146 1.000
##
## Pearson's Chi-squared test
##
## data: contable
## X-squared = 66994, df = 1900, p-value <2e-16
With a p-value below 2e-16, far smaller than our significance level of α = 0.05, we reject the null hypothesis: there are significant differences between states' proportions of body camera usage during fatal police shootings.
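For reference, a chi-square test of independence between state and camera status can be sketched on a states-by-status contingency table. The data below are simulated and the three "states" are hypothetical. The object name `contable` is borrowed from the output above, but how the original table was built is not shown, so this is only one possible construction.

```r
# Sketch: chi-square test of independence on a state x camera-status table.
set.seed(42)
n <- 300
state <- sample(c("A", "B", "C"), n, replace = TRUE)

# Hypothetical per-state probabilities that the camera was on
p_on <- c(A = 0.05, B = 0.15, C = 0.40)
body_camera <- runif(n) < unname(p_on[state])

# Rows are states, columns are FALSE/TRUE camera status
contable <- table(state, body_camera)
contable

result <- chisq.test(contable)
result$parameter  # degrees of freedom: (rows - 1) * (columns - 1)
result$p.value
```

With k states and two camera statuses, a table built this way has k - 1 degrees of freedom.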
We now know there are significant differences in the level of body camera usage during police shootings among US states. Let us see if we can find out what drives those differences.
Our second SMART question: For the years 2021 and 2022, what variables influence a state’s proportion of body cameras turned on during fatal police shootings?
The variables we will study include:
- US region
- Law enforcement officers per 100,000 citizens
- Law enforcement spending per capita
- Body camera mandate laws
- 2020 presidential election voting
We will use multiple linear regression to build models that investigate whether any of these variables can be useful predictors of body cameras being turned on during fatal shootings in the United States.
Because most states’ body camera laws were enacted at the start of 2021, we will only look at data from 2021 and 2022. This reduces the number of cases in our original data set to 1,763.
First let us take a look at the new data set with its new variables (added in Part 3):
## # A tibble: 6 × 17
## # Groups: state [6]
## state month year regions spendpc bclaw marg2020 le_per_1…¹ stbcp gen.p smi.p
## <chr> <chr> <chr> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 WA 10 2022 NW 608 0 19 320. 0.0556 0.917 0.5
## 2 OR 10 2022 NW 736 0 16 284. 0.04 0.96 0.36
## 3 KS 10 2022 MW 553 0 -15 467. 0.125 0.938 0.188
## 4 CA 10 2022 NW 981 0 29 378. 0.146 0.944 0.255
## 5 CO 10 2022 NW 664 0 14 417. 0.0727 0.982 0.164
## 6 OK 10 2022 SW 487 0 -33 410. 0.128 1 0.277
## # … with 6 more variables: flee.p <dbl>, att.p <dbl>, armed.p <dbl>,
## # MoD.p <dbl>, age.avg <dbl>, Non_White_prop <dbl>, and abbreviated variable
## # name ¹le_per_100k
Number of observations:
## [1] 1763
The following figure depicts the relationship between body camera laws and stbcp.
The following figure depicts the relationship between law enforcement officers per 100K citizens and stbcp.
This figure depicts the relationship between the 2020 US presidential election margin and stbcp.
This figure depicts the relationship between the state spending on policing per capita and stbcp.
## Warning: Using size for a discrete variable is not advised.
This figure shows law enforcement officers per 100K citizens, grouped by region.
## Warning: Using size for a discrete variable is not advised.
The following plot compares stbcp, the 2020 election margin, and region.
Finally, here is a plot showing law enforcement officers per 100K citizens, police spending, and region.
Now that we are familiar with the data, we can start to model with our new state-wide data.
##
## Call:
## lm(formula = stbcp ~ (marg2020 + bclaw + regions + le_per_100k +
## spendpc), data = FD)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.1649 -0.0532 0.0032 0.0318 0.9610
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.05e-01 1.53e-02 6.84 1.1e-11 ***
## marg2020 -6.23e-04 1.66e-04 -3.76 0.00018 ***
## bclaw -3.43e-02 1.44e-02 -2.38 0.01734 *
## regionsNE -4.59e-02 9.28e-03 -4.94 8.4e-07 ***
## regionsNW 3.64e-02 7.95e-03 4.58 4.9e-06 ***
## regionsSE 2.40e-02 6.44e-03 3.73 0.00020 ***
## regionsSW 5.96e-03 6.75e-03 0.88 0.37738
## le_per_100k -6.46e-05 3.67e-05 -1.76 0.07900 .
## spendpc 3.74e-05 2.01e-05 1.86 0.06262 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.0787 on 1754 degrees of freedom
## Multiple R-squared: 0.114, Adjusted R-squared: 0.11
## F-statistic: 28.3 on 8 and 1754 DF, p-value: <2e-16
## GVIF Df GVIF^(1/(2*Df))
## marg2020 2.99 1 1.73
## bclaw 1.09 1 1.04
## regions 5.43 4 1.24
## le_per_100k 2.19 1 1.48
## spendpc 3.97 1 1.99
## res
## 1 -0.07591
## 2 -0.10046
## 3 0.02029
## 4 0.01029
## 5 -0.05770
## 6 0.00461
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
The VIF values for model 1 are all within acceptable range.
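The variance inflation factor of a numeric predictor can be computed by hand as 1 / (1 - R_j^2), where R_j^2 comes from regressing that predictor on the other predictors. The sketch below does this on simulated data in base R. The data frame `FD_sim` is made up, and the GVIF columns in the output above resemble what `car::vif()` prints for models with factor predictors, which this hand calculation does not reproduce.

```r
# Sketch: variance inflation factors computed by hand for numeric predictors.
set.seed(7)
n <- 200
FD_sim <- data.frame(
  marg2020 = rnorm(n, 0, 20),
  le_per_100k = rnorm(n, 440, 80)
)
# Make spendpc partly dependent on marg2020 so the predictors are correlated
FD_sim$spendpc <- 600 + 5 * FD_sim$marg2020 + rnorm(n, 0, 100)

predictors <- c("marg2020", "le_per_100k", "spendpc")

vif_manual <- sapply(predictors, function(p) {
  others <- setdiff(predictors, p)
  # Auxiliary regression of one predictor on all the others
  aux <- lm(reformulate(others, response = p), data = FD_sim)
  1 / (1 - summary(aux)$r.squared)
})
vif_manual
```

A value of 1 means a predictor is uncorrelated with the others. Common rules of thumb treat values above roughly 5 or 10 as a sign of problematic multicollinearity.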
With an R^2 of 0.114, this model is not very good at predicting statewide body camera usage. To see how much the region variable contributes, we remove it in the next model.
##
## Call:
## lm(formula = stbcp ~ (marg2020 + bclaw + spendpc + le_per_100k),
## data = FD)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.1722 -0.0554 0.0078 0.0320 0.9106
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.29e-01 1.51e-02 8.54 < 2e-16 ***
## marg2020 -1.26e-03 1.53e-04 -8.21 4.3e-16 ***
## bclaw -1.76e-02 1.44e-02 -1.22 0.22
## spendpc 1.17e-04 1.62e-05 7.25 6.3e-13 ***
## le_per_100k -2.05e-04 2.57e-05 -7.97 2.7e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.0807 on 1758 degrees of freedom
## Multiple R-squared: 0.0653, Adjusted R-squared: 0.0632
## F-statistic: 30.7 on 4 and 1758 DF, p-value: <2e-16
## marg2020 bclaw spendpc le_per_100k
## 2.42 1.03 2.45 1.02
## res
## 1 -0.05538
## 2 -0.09714
## 3 0.00778
## 4 0.01558
## 5 -0.03122
## 6 -0.01597
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
The VIF values for model 2 are all within acceptable range.
With an R^2 of 0.065, this model is even worse at predicting statewide body camera usage.
##
## Call:
## lm(formula = stbcp ~ (marg2020 + bclaw + regions + le_per_100k +
## spendpc + I(spendpc * le_per_100k)), data = FD)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.1631 -0.0504 0.0022 0.0321 0.9585
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.59e-01 4.23e-02 3.77 0.00017 ***
## marg2020 -5.63e-04 1.71e-04 -3.29 0.00104 **
## bclaw -3.62e-02 1.45e-02 -2.50 0.01250 *
## regionsNE -4.72e-02 9.32e-03 -5.06 4.7e-07 ***
## regionsNW 3.88e-02 8.12e-03 4.77 2.0e-06 ***
## regionsSE 2.57e-02 6.54e-03 3.92 9.2e-05 ***
## regionsSW 7.81e-03 6.88e-03 1.14 0.25610
## le_per_100k -1.83e-04 9.28e-05 -1.97 0.04915 *
## spendpc -4.01e-05 5.94e-05 -0.68 0.49955
## I(spendpc * le_per_100k) 1.63e-07 1.17e-07 1.39 0.16579
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.0786 on 1753 degrees of freedom
## Multiple R-squared: 0.115, Adjusted R-squared: 0.111
## F-statistic: 25.4 on 9 and 1753 DF, p-value: <2e-16
## res
## 1 -0.08069
## 2 -0.10175
## 3 0.02266
## 4 0.01201
## 5 -0.05976
## 6 0.00391
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
We are ignoring the VIF test for multicollinearity because we are using an interaction predictor.
With an R^2 of 0.115, this model is still not good, though it is the best of the three at predicting statewide body camera usage.
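Because adding predictors can never lower R^2, nested models are usually compared with adjusted R^2 or a partial F-test as well. The following sketch shows that comparison on simulated data. The variable names echo this report, but the data, the effect sizes, and the model objects `m_small` and `m_full` are made up.

```r
# Sketch: compare two nested linear models on simulated data.
set.seed(3)
n <- 300
sim <- data.frame(
  marg2020 = rnorm(n, 0, 20),
  regions = factor(sample(c("MW", "NE", "NW", "SE", "SW"), n, replace = TRUE))
)

# Hypothetical region effects used only to generate the response
region_effect <- c(MW = 0, NE = -0.05, NW = 0.04, SE = 0.02, SW = 0)
sim$stbcp <- 0.1 - 0.0006 * sim$marg2020 +
  unname(region_effect[as.character(sim$regions)]) +
  rnorm(n, 0, 0.08)

m_small <- lm(stbcp ~ marg2020, data = sim)
m_full <- lm(stbcp ~ marg2020 + regions, data = sim)

# R^2 cannot decrease when predictors are added; adjusted R^2 can
r2 <- c(small = summary(m_small)$r.squared, full = summary(m_full)$r.squared)
adj_r2 <- c(small = summary(m_small)$adj.r.squared,
            full = summary(m_full)$adj.r.squared)
r2
adj_r2

# Partial F-test: does adding regions improve the fit significantly?
f_test <- anova(m_small, m_full)
f_test
```

A small p-value in the second row of the `anova()` table indicates that the added terms explain more variance than would be expected by chance.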
Since lm3 is our best model (per our R^2), let's try to predict the body camera proportion for a few made-up US states:
Note that Eleum and Faraam are identical except for their body camera laws. The same is true of GW and HW.
Now let’s plug these new “states” into our model:
## fit lwr upr
## 1 0.0749 0.0439 0.106
## fit lwr upr
## 1 0.133 0.113 0.152
## fit lwr upr
## 1 0.00019 -0.032 0.0324
## fit lwr upr
## 1 0.144 0.104 0.184
## fit lwr upr
## 1 0.146 0.132 0.161
## fit lwr upr
## 1 0.11 0.0784 0.142
## fit lwr upr
## 1 0.106 0.0936 0.118
## fit lwr upr
## 1 0.0697 0.0407 0.0986
We can see the difference in fitted values between Eleum and Faraam, as well as between GW and HW.
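Predictions of this shape (fit, lwr, upr) are what `predict()` returns for a linear model when an interval is requested. The sketch below fits a simplified model on simulated data and predicts for two hypothetical states that differ only in `bclaw`. The training data, the state names, and the choice of a confidence interval are assumptions, and the model is not lm3.

```r
# Sketch: predict for two hypothetical states that differ only in bclaw.
set.seed(11)
n <- 200
train <- data.frame(
  marg2020 = rnorm(n, 0, 20),
  bclaw = rbinom(n, 1, 0.2),
  spendpc = rnorm(n, 650, 150),
  le_per_100k = rnorm(n, 440, 80)
)
train$stbcp <- 0.12 - 0.001 * train$marg2020 - 0.02 * train$bclaw +
  rnorm(n, 0, 0.05)

fit <- lm(stbcp ~ marg2020 + bclaw + spendpc + le_per_100k, data = train)

# Two made-up states, identical except for the body camera law indicator
new_states <- data.frame(
  name = c("State X", "State Y"),
  marg2020 = c(10, 10),
  bclaw = c(0, 1),
  spendpc = c(600, 600),
  le_per_100k = c(400, 400)
)

pred <- predict(fit, newdata = new_states, interval = "confidence", level = 0.95)
pred

# The gap between the two fitted values equals the bclaw coefficient
pred[2, "fit"] - pred[1, "fit"]
coef(fit)[["bclaw"]]
```

When `bclaw` enters a model only as a main effect, two states that differ only in `bclaw` have fitted values that differ by exactly the `bclaw` coefficient, whatever their other characteristics.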
Studying the use of body cameras in police work is an important topic for data-driven policy research in the United States. We hoped to tie the differences between U.S. states in whether body cameras were on or off during fatal shootings to state policy on body cameras or to some other state-level variable, but we were unable to find a strong correlation. Although lm3 was our best model, it is still not a great predictor of statewide body camera usage. This leads us to the following conclusions:
- Both regional and state groupings demonstrated quantifiable differences in the proportion of body cameras turned on or off during fatal police shootings.
- The number of law enforcement officers per capita does not influence whether a body camera is turned on or off.
- State spending on policing per capita does not influence whether a body camera is turned on or off.
- The political affiliation of a state does not influence whether a body camera is turned on or off.
- Body camera mandate laws, present in Maryland, New Jersey, New Mexico, and South Carolina, slightly influence whether a body camera is turned on or off.
Considering that body camera laws are relatively new, it will be interesting to evaluate changes in stbcp as more states adopt such laws. This research project suggests that they may be the best chance states have of increasing camera usage during active police duty.